Variational Inference for Policy Search in Changing Situations
Author
Abstract
Many policy search algorithms minimize the Kullback-Leibler (KL) divergence to a certain target distribution in order to fit their policy. The commonly used KL-divergence forces the resulting policy to be 'reward-attracted': the policy tries to reproduce all positively rewarded experience, while negative experience is neglected. However, the KL-divergence is not symmetric, and we can also minimize the reversed KL-divergence, which is typically used in variational inference. The policy then becomes 'cost-averse': it tries to avoid reproducing any negatively rewarded experience while maximizing exploration. Due to this 'cost-averseness' of the policy, Variational Inference for Policy Search (VIP) has several interesting properties. It requires neither a kernel bandwidth nor an exploration rate; such settings are determined automatically by the inference. The algorithm matches the performance of state-of-the-art methods while also being applicable to learning in multiple situations simultaneously. We concentrate on using VIP for policy search in robotics and apply our algorithm to learn dynamic counterbalancing of different kinds of pushes with human-like 2-link and 4-link robots.
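To make the two projections concrete, here is a small self-contained sketch (our illustration, not the paper's algorithm): a single Gaussian is fitted to a bimodal 'reward' target, once by minimizing KL(p||q) (moment projection, reward-attracted) and once by minimizing the reversed KL(q||p) (information projection, cost-averse). The toy reward, the grid, and the numpy/scipy usage are all assumptions made for this example.

```python
# Toy contrast of the two KL directions; NOT the paper's VIP algorithm.
import numpy as np
from scipy.optimize import minimize

a = np.linspace(-6.0, 6.0, 2001)                     # action grid
da = a[1] - a[0]
r = -np.minimum((a - 2.0) ** 2, (a + 2.0) ** 2)      # reward with two peaks
p = np.exp(r - r.max())
p /= p.sum() * da                                    # normalized target p ~ exp(r)

def gauss(a, mu, sigma):
    return np.exp(-0.5 * ((a - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# KL(p || q): the optimum matches the moments of p, so q spreads its mass
# over BOTH reward peaks ("reproduce all positively rewarded experience").
mu_m = (p * a).sum() * da
sigma_m = np.sqrt((p * (a - mu_m) ** 2).sum() * da)

# KL(q || p): minimized numerically; q commits to ONE peak and assigns
# little mass to low-reward regions ("avoid negative experience").
def reverse_kl(theta):
    mu, log_sigma = theta
    q = gauss(a, mu, np.exp(log_sigma))
    return (q * (np.log(q + 1e-300) - np.log(p + 1e-300))).sum() * da

mu_i, log_sigma_i = minimize(reverse_kl, x0=[1.0, 0.0]).x
print(f"KL(p||q) fit: mu={mu_m:.2f}, sigma={sigma_m:.2f}")              # broad
print(f"KL(q||p) fit: mu={mu_i:.2f}, sigma={np.exp(log_sigma_i):.2f}")  # one peak
```

The moment projection returns a broad Gaussian centered between the two peaks, while the reversed direction locks onto a single peak, mirroring the reward-attracted versus cost-averse behavior described above.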
Similar Papers
Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning
In this paper we discuss very preliminary work on how we can reduce the variance in black box variational inference based on a framework that combines Monte Carlo with exhaustive search. We also discuss how Monte Carlo and exhaustive search can be combined to deal with infinite dimensional discrete spaces. Our method builds upon and extends a recently proposed algorithm that constructs stochast...
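The snippet above is truncated, but the general recipe it gestures at can be sketched as follows (a hedged reading, not the paper's exact estimator): enumerate the highest-probability discrete states exhaustively and use Monte Carlo only for the remaining tail, which keeps the estimator unbiased while the enumerated head contributes no sampling variance. The distribution, the integrand, and the head_tail_estimate helper are all hypothetical.

```python
# Hedged sketch: exact summation over a "head" of probable states,
# Monte Carlo over the tail (a Rao-Blackwellized, unbiased estimator).
import numpy as np

rng = np.random.default_rng(0)
K = 1000                                   # discrete state space size
q = rng.dirichlet(np.full(K, 0.05))        # a peaked distribution q(z)
f = rng.normal(size=K)                     # arbitrary integrand f(z)

def head_tail_estimate(q, f, k=20, n_samples=100):
    head = np.argsort(q)[-k:]                      # exhaustive part
    exact = float(q[head] @ f[head])
    tail = np.setdiff1d(np.arange(len(q)), head)   # Monte Carlo part
    q_tail = q[tail] / q[tail].sum()
    z = rng.choice(tail, size=n_samples, p=q_tail)
    return exact + q[tail].sum() * f[z].mean()

mc_only = f[rng.choice(K, size=100, p=q)].mean()   # plain Monte Carlo
print("head+tail:", head_tail_estimate(q, f))
print("plain MC :", mc_only)
print("exact    :", float(q @ f))
```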
Natural Gradients via the Variational Predictive Distribution
Variational inference transforms posterior inference into parametric optimization, thereby enabling the use of latent-variable models where it would otherwise be impractical. However, variational inference can be finicky when different variational parameters control variables that are strongly correlated under the model. Traditional natural gradients that use the variational approximation fail t...
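For context, here is a minimal sketch of the 'traditional' natural-gradient step the snippet contrasts against: the ELBO gradient is preconditioned with the inverse Fisher information of the variational approximation q = N(mu, sigma^2). The conjugate toy model (z ~ N(0,1), x|z ~ N(z,1)) and the closed-form gradients are our own choices for illustration, not the paper's method.

```python
# Traditional natural-gradient ascent on the ELBO for a conjugate toy model.
import numpy as np

x = 1.5                               # single observation
mu, log_sigma = -3.0, 1.0             # variational parameters of q

for step in range(200):
    s2 = np.exp(2 * log_sigma)
    # Closed-form ELBO gradient for this toy model (exact posterior N(x/2, 1/2)).
    g_mu = x - 2 * mu
    g_ls = 1 - 2 * s2
    # Fisher of N(mu, sigma^2) in (mu, log sigma) is diag(1/sigma^2, 2),
    # so the natural gradient simply rescales each coordinate.
    mu += 0.1 * (s2 * g_mu)
    log_sigma += 0.1 * (g_ls / 2)

print(f"q: N({mu:.3f}, {np.exp(2 * log_sigma):.3f})  exact: N({x / 2:.3f}, 0.500)")
```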
Black-Box Policy Search with Probabilistic Programs
In this work we show how to represent policies as programs: that is, as stochastic simulators with tunable parameters. To learn the parameters of such policies we develop connections between black box variational inference and existing policy search approaches. We then explain how such learning can be implemented in a probabilistic programming system. Using our own novel implementation of such ...
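A hedged sketch of that connection: a policy written as a stochastic simulator with tunable parameters, trained with the score-function (likelihood-ratio) gradient estimator that black-box variational inference also relies on. The one-step Gaussian policy and the quadratic reward below are illustrative assumptions, not the paper's benchmark.

```python
# Policy-as-program trained with a REINFORCE-style score-function gradient.
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([0.0, 0.0])                  # [mu, log_sigma]

def policy_program(theta):
    """A stochastic simulator: running it samples an action."""
    mu, log_sigma = theta
    return mu + np.exp(log_sigma) * rng.standard_normal()

def grad_log_prob(theta, action):
    """Score function of the Gaussian policy at the sampled action."""
    mu, log_sigma = theta
    s2 = np.exp(2 * log_sigma)
    return np.array([(action - mu) / s2, (action - mu) ** 2 / s2 - 1.0])

baseline = 0.0
for step in range(2000):
    a = policy_program(theta)
    r = -(a - 3.0) ** 2                       # toy reward, optimum at a = 3
    baseline = 0.99 * baseline + 0.01 * r     # running baseline reduces variance
    theta = theta + 0.01 * (r - baseline) * grad_log_prob(theta, a)

print("learned mu =", theta[0], " sigma =", np.exp(theta[1]))
```

As the policy converges on the optimal action, sigma shrinks, which is the expected behavior for a deterministic optimum.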
Variational Action Selection for Influence Diagrams
Influence diagrams provide a compact way to represent problems of decision making under uncertainty. As the number of variables in the problem increases, computing exact expectations and making optimal decisions becomes computationally intractable. A new method of action selection is presented, based on variational approximate inference. A policy is approximated where high-probability actions u...
Variational probabilistic inference and the QMR-DT database
We describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnos...
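As a minimal illustration of such a deterministic procedure (not the QMR-DT method itself, whose details the snippet omits): mean-field coordinate ascent on a two-variable pairwise model, checked against exact marginals from enumeration. The parameter values are arbitrary choices for the example.

```python
# Mean-field coordinate ascent vs. exact enumeration on a tiny pairwise model.
import numpy as np

t1, t2, w = 0.5, -0.3, 2.0             # model parameters (arbitrary)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Exact marginals by enumerating all four joint states.
states = [(a, b) for a in (0, 1) for b in (0, 1)]
p = np.array([np.exp(t1 * a + t2 * b + w * a * b) for a, b in states])
p /= p.sum()
exact_q1 = p[2] + p[3]                 # P(x1 = 1)
exact_q2 = p[1] + p[3]                 # P(x2 = 1)

# Mean-field fixed-point updates: deterministic, no sampling involved.
q1, q2 = 0.5, 0.5
for _ in range(100):
    q1 = sigmoid(t1 + w * q2)
    q2 = sigmoid(t2 + w * q1)

print(f"mean-field: q1={q1:.3f} q2={q2:.3f}")
print(f"exact     : q1={exact_q1:.3f} q2={exact_q2:.3f}")
```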
Published: 2011